Intermediate-layer DNN adaptation for offline and session-based iterative speaker adaptation
Abstract
In this work we present intermediate-layer deep neural network (DNN) adaptation techniques, upon which we build offline as well as iterative speaker adaptation for online applications. We motivate our online work with task completion in the Microsoft personal voice assistant, where we present different adaptation styles within a speech session, e.g., (a) adapt the speaker-independent (SI) model on the current utterance, (b) recursively adapt an incremental speaker-dependent (SD) model in the session on just the previous utterance, or (c) adapt the SI model on all past utterances in the session. We considered a number of adaptation techniques and demonstrated that the intermediate-layer approach of inserting and adapting a linear layer on top of an intermediate singular-value-decomposition (SVD) layer provides the best results for offline adaptation, where we obtained 22.6% and 12% relative reductions in word error rate (WER) for supervised and unsupervised adaptation, respectively, on 100 utterances. An alternative intermediate-layer recursive adaptation in a 5-utterance session provided a 6% relative reduction in WER for online applications.
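The abstract does not spell out how the inserted linear layer is set up, so the following is only a minimal sketch of one common reading, not the authors' implementation. It assumes a hidden layer already factorized into two low-rank matrices (here called `a` and `b`, standing in for the singular-value-decomposition (SVD) factors), inserts a small square matrix `s` between them initialized to the identity, and updates only `s` on adaptation data. All names, the toy values, the squared-error loss, and the plain gradient step are illustrative assumptions.

```python
# Sketch only: names, toy values, and the squared-error loss are assumptions
# made for illustration; they do not come from the paper.

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]

def transpose(m):
    return [list(row) for row in zip(*m)]

def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def forward(x, a, s, b):
    """Factorized layer with the inserted k-by-k matrix s: y = x a s b."""
    return matmul(matmul(matmul(x, a), s), b)

def loss(y, t):
    """Half the summed squared error between output y and target t."""
    return 0.5 * sum((yi - ti) ** 2 for yr, tr in zip(y, t) for yi, ti in zip(yr, tr))

def adapt_step(x, a, s, b, t, lr):
    """One gradient step on s only; the shared factors a and b stay frozen."""
    z = matmul(x, a)                      # bottleneck activations
    y = matmul(matmul(z, s), b)
    err = [[yi - ti for yi, ti in zip(yr, tr)] for yr, tr in zip(y, t)]
    grad = matmul(matmul(transpose(z), err), transpose(b))  # dLoss/ds
    return [[sv - lr * gv for sv, gv in zip(sr, gr)] for sr, gr in zip(s, grad)]

# Toy speaker-independent factors (3 inputs -> rank 2 -> 3 outputs).
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
x = [[1.0, 2.0, 3.0]]                     # one adaptation frame
t = [[3.0, 6.0, 9.0]]                     # its (hypothetical) target

s_si = identity(2)                        # identity start reproduces the SI layer
loss_before = loss(forward(x, a, s_si, b), t)
s_sd = adapt_step(x, a, s_si, b, t, lr=0.01)
loss_after = loss(forward(x, a, s_sd, b), t)
```

Starting from the identity means the network initially behaves exactly like the SI model, and the speaker-specific footprint is only the k-by-k matrix `s` rather than a full layer.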
Related articles
Multi-Attribute Factorized Hidden Layer Adaptation for DNN Acoustic Models
Recently, Factorized Hidden Layer (FHL) adaptation was proposed for speaker adaptation of deep neural network (DNN) based acoustic models. In addition to the standard affine transformation, an FHL contains a speaker-dependent (SD) transformation matrix using a linear combination of rank-1 matrices and an SD bias using a linear combination of vectors. In this work, we extend the FHL based ada...
A study of speaker adaptation for DNN-based speech synthesis
A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied....
An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation
Different training and adaptation techniques for multilingual Automatic Speech Recognition (ASR) are explored in the context of hybrid systems, exploiting Deep Neural Networks (DNN) and Hidden Markov Models (HMM). In multilingual DNN training, the hidden layers (possibly extracting bottleneck features) are usually shared across languages, and the output layer can either model multiple sets of l...
Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation
In this paper, we introduce ensemble speaker modeling using a speaker adaptive training (SAT) deep neural network (SAT-DNN). We first train a speaker-independent DNN (SI-DNN) acoustic model as a universal speaker model (USM). Based on the USM, a SAT-DNN is used to obtain a set of speaker-dependent models by assuming that all other layers except one speaker-dependent (SD) layer are shared amon...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Publication date: 2015